Towards Trustworthy AutoGrading of Short, Multi-lingual, Multi-type Answers

Abstract

Autograding short textual answers has become much more feasible due to the rise of NLP and the increased availability of question-answer pairs brought about by a shift to online education. Autograding performance is still inferior to human grading. The statistical and black-box nature of state-of-the-art machine learning models makes them untrustworthy, raising ethical concerns and limiting their practical utility. Furthermore, the evaluation of autograding is typically confined to small, monolingual datasets for a specific question type. This study uses a large dataset consisting of 10 million question-answer pairs from multiple languages, covering diverse fields such as math and language, with strong variation in question and answer syntax. We demonstrate the effectiveness of fine-tuning transformer models for autograding on such complex datasets. Our best hyperparameter-tuned model yields an accuracy of 86.5%, comparable to that of models that are less general and more tuned to a specific type of question, subject, and language. More importantly, we address trust and ethical concerns. By involving humans in the autograding process, we show how to improve the accuracy of automatically graded answers, achieving accuracy equivalent to that of teaching assistants. We also show how teachers can effectively control the errors made by the system and how they can efficiently validate that the autograder's performance on individual exams is close to the expected performance.

Similar resources

Multi-lingual Multi-media Information Retrieval System

Aiming to realize such a system, we build the prototype MLMMIR system, called BOSS [7]. BOSS can retrieve a mixture of various kinds of useful data by first translating the contents that are included in lingual and media data into Japanese keywords, and then providing them to the Japanese IR engine that we developed. The performance of the system depends heavily on the Japanese IR engine, and t...

Towards a multi-lingual workflow system: a practical outlook

Due to rapid development in the global market, workflow systems are no longer limited to a single country. Electronic business, like workflows, spans countries, and hence the need arises for users of the system to operate it in their own local language. For software such as workflow management systems it is highly imperative that it should be internationalized ...

Multi Lingual Sequent

We study a Gentzen-style sequent calculus where the formulas on the left and right of the turnstile need not necessarily come from the same logical system. Such a sequent can be seen as a consequence between different domains of reasoning. We discuss the ingredients needed to set up the logic generalized in this fashion. The usual cut rule does not make sense for sequents which connect different ...

Multi-lingual Prosodic Processing

In our previous research, we have shown that prosody can be used to dramatically improve the performance of the automatic speech translation system VERBMOBIL [9]. The methods to classify prosodic events have been developed on the German sub-corpus of the VERBMOBIL speech database. In this paper we describe how the methods that we developed on the German sub-corpus can be applied to other langua...

Multi-lingual duration modeling

Controlling timing in text-to-speech synthesis systems is complicated, because there are many contextual factors that affect timing; moreover, which factors matter and what their precise effects are varies among languages. We describe here a language-independent approach for duration control. At run time, a language-independent timing module accesses language-specific tables. These tables specif...

Journal

Journal title: International Journal of Artificial Intelligence in Education

Year: 2022

ISSN: 1560-4292, 1560-4306

DOI: https://doi.org/10.1007/s40593-022-00289-z